Using VTLN for broadcast news transcription
Abstract
Vocal tract length normalisation (VTLN) is a commonly used speaker normalisation approach. It is attractive compared to many normalisation schemes as it is typically dependent on only a single parameter, allowing the warp factors to be robustly calculated on little data. However, the scheme normally requires explicitly coding the data at multiple warp factors. Furthermore, it is only possible to approximate the Jacobian associated with the VTLN transformation. A new, simple, linear approximation to VTLN is described in this paper. This linear approximation allows the Jacobian to be exactly computed. It can also be highly efficient in terms of warp factor estimation and application of the warp factors. Both the linear and standard CUED VTLN schemes were evaluated in the 2003 BNE evaluation framework and found to yield similar performance. When used in system combination both VTLN schemes yielded slight gains over the baseline system.
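The role of the Jacobian in warp factor estimation can be illustrated with a small sketch. It assumes the warp is applied as a linear transform A(alpha) on the feature vector, so the log-likelihood of the unwarped data gains the exact term log|det A(alpha)|. The 2x2 matrix in `warp_matrix`, the single Gaussian model, the toy frames and the 0.80 to 1.20 search grid are all hypothetical choices made for illustration; they are not the transform or the setup used in the paper.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def warp_matrix(alpha):
    """Hypothetical linear warp: identity at alpha = 1, determinant = alpha.
    This is an illustrative stand-in, not the paper's VTLN matrix."""
    return [[alpha, alpha - 1.0], [0.0, 1.0]]

def apply_matrix(A, x):
    """Matrix-vector product for small dense matrices."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def log_abs_det_2x2(A):
    """Exact log |det A| for a 2x2 matrix: the Jacobian term."""
    return math.log(abs(A[0][0] * A[1][1] - A[0][1] * A[1][0]))

def warp_score(frames, alpha, mean, var, use_jacobian=True):
    """Total log-likelihood of the frames after warping with alpha.
    With use_jacobian=True, log|det A| is added once per frame."""
    A = warp_matrix(alpha)
    jac = log_abs_det_2x2(A) if use_jacobian else 0.0
    return sum(log_gauss_diag(apply_matrix(A, x), mean, var) + jac for x in frames)

def best_warp(frames, mean, var, grid, use_jacobian=True):
    """Grid search: return the warp factor with the highest score."""
    return max(grid, key=lambda a: warp_score(frames, a, mean, var, use_jacobian))

# Toy data (assumed): four 2-D frames and a unit-variance, zero-mean model.
FRAMES = [[1.25, 0.0], [-1.25, 0.0], [0.0, 1.0], [0.0, -1.0]]
GRID = [round(0.80 + 0.02 * i, 2) for i in range(21)]

if __name__ == "__main__":
    mean, var = [0.0, 0.0], [1.0, 1.0]
    print("with Jacobian:   ", best_warp(FRAMES, mean, var, GRID))
    print("without Jacobian:", best_warp(FRAMES, mean, var, GRID, use_jacobian=False))
```

On this toy data, the search without the Jacobian term drifts to the smallest warp factor on the grid, because shrinking the features towards the model mean always raises the Gaussian score. Including the exact log-determinant counteracts that bias and gives an interior optimum.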
Similar resources
Toward Automatic Recognition of Japanese Broadcast News
In this paper we report on automatic recognition of Japanese broadcast-news speech. We have been working on large-vocabulary continuous speech recognition (LVCSR) for Japanese newspaper speech transcription and achieved reasonably good performance. We have recently applied our LVCSR system to transcribing Japanese broadcast-news speech. We extended the vocabulary to 20k words and trained the lan...
Toward automatic transcription of Japanese broadcast news
In this paper, we report on the automatic recognition of Japanese broadcast-news speech. We have been working on large-vocabulary continuous speech recognition (LVCSR) for Japanese newspaper speech transcription and have achieved good performance. We have recently applied our LVCSR system to transcribing Japanese broadcast-news speech. We extended the vocabulary from 7k words to 20k words and tr...
Using VTLN matrices for rapid and computationally-efficient speaker adaptation with robustness to first-pass transcription errors
In this paper, we propose to combine the rapid adaptation capability of conventional Vocal Tract Length Normalization (VTLN) with the computational efficiency of transform-based adaptation such as MLLR or CMLLR. VTLN requires the estimation of only one parameter and is, therefore, most suited for the cases where there is little adaptation data (i.e. rapid adaptation). In contrast, transform-bas...
Unsupervised language model adaptation for Mandarin broadcast conversation transcription
This paper investigates unsupervised language model adaptation on a new task of Mandarin broadcast conversation transcription. It was found that N-gram adaptation yields 1.1% absolute character error rate gain and continuous space language model adaptation done with PLSA and LDA brings 1.3% absolute gain. Moreover, using broadcast news language model alone trained on large data under-performs a...
Online Temporal Language Model Adaptation for a Thai Broadcast News Transcription System
This paper investigates the effectiveness of online temporal language model adaptation when applied to a Thai broadcast news transcription task. Our adaptation scheme works as follows: first, an initial language model is trained with the broadcast news transcriptions available during the development period. Then the language model is adapted over time with more recent broadcast news transcription and ...